OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Digitizing a Million Books: Challenges for Document Analysis

Internal identifier: 001088 (Main/Exploration); previous: 001087; next: 001089

Authors: Pramod Sankar [India]; Vamshi Ambati [United States]; Lakshmi Pratha [India]; V. Jawahar [India]

RBID : ISTEX:E96E767CE48405122392E7508C98969E20DA18DE

Abstract

This paper describes the challenges for document image analysis community for building large digital libraries with diverse document categories. The challenges are identified from the experience of the on-going activities toward digitizing and archiving one million books. Smooth workflow has been established for archiving large quantity of books, with the help of efficient image processing algorithms. However, much more research is needed to address the challenges arising out of the diversity of the content in digital libraries.

DOI: 10.1007/11669487_38


The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Digitizing a Million Books: Challenges for Document Analysis</title>
<author>
<name sortKey="Sankar, Pramod" sort="Sankar, Pramod" uniqKey="Sankar P" first="Pramod" last="Sankar">Pramod Sankar</name>
</author>
<author>
<name sortKey="Ambati, Vamshi" sort="Ambati, Vamshi" uniqKey="Ambati V" first="Vamshi" last="Ambati">Vamshi Ambati</name>
</author>
<author>
<name sortKey="Pratha, Lakshmi" sort="Pratha, Lakshmi" uniqKey="Pratha L" first="Lakshmi" last="Pratha">Lakshmi Pratha</name>
</author>
<author>
<name sortKey="Jawahar, V" sort="Jawahar, V" uniqKey="Jawahar V" first="V." last="Jawahar">V. Jawahar</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:E96E767CE48405122392E7508C98969E20DA18DE</idno>
<date when="2006" year="2006">2006</date>
<idno type="doi">10.1007/11669487_38</idno>
<idno type="url">https://api.istex.fr/document/E96E767CE48405122392E7508C98969E20DA18DE/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000213</idno>
<idno type="wicri:Area/Istex/Curation">000210</idno>
<idno type="wicri:Area/Istex/Checkpoint">000A61</idno>
<idno type="wicri:doubleKey">0302-9743:2006:Sankar P:digitizing:a:million</idno>
<idno type="wicri:Area/Main/Merge">001105</idno>
<idno type="wicri:Area/Main/Curation">001088</idno>
<idno type="wicri:Area/Main/Exploration">001088</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Digitizing a Million Books: Challenges for Document Analysis</title>
<author>
<name sortKey="Sankar, Pramod" sort="Sankar, Pramod" uniqKey="Sankar P" first="Pramod" last="Sankar">Pramod Sankar</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Inde</country>
<wicri:regionArea>Regional Mega Scanning Centre, International Institute of Information Technology, Hyderabad</wicri:regionArea>
<wicri:noRegion>Hyderabad</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Ambati, Vamshi" sort="Ambati, Vamshi" uniqKey="Ambati V" first="Vamshi" last="Ambati">Vamshi Ambati</name>
<affiliation wicri:level="4">
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Institute for Software Research International, Carnegie Mellon University</wicri:regionArea>
<placeName>
<settlement type="city">Pittsburgh</settlement>
<region type="state">Pennsylvanie</region>
</placeName>
<orgName type="university">Université Carnegie-Mellon</orgName>
</affiliation>
</author>
<author>
<name sortKey="Pratha, Lakshmi" sort="Pratha, Lakshmi" uniqKey="Pratha L" first="Lakshmi" last="Pratha">Lakshmi Pratha</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Inde</country>
<wicri:regionArea>Regional Mega Scanning Centre, International Institute of Information Technology, Hyderabad</wicri:regionArea>
<wicri:noRegion>Hyderabad</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Jawahar, V" sort="Jawahar, V" uniqKey="Jawahar V" first="V." last="Jawahar">V. Jawahar</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Inde</country>
<wicri:regionArea>Regional Mega Scanning Centre, International Institute of Information Technology, Hyderabad</wicri:regionArea>
<wicri:noRegion>Hyderabad</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Inde</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s">Lecture Notes in Computer Science</title>
<imprint>
<date>2006</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">E96E767CE48405122392E7508C98969E20DA18DE</idno>
<idno type="DOI">10.1007/11669487_38</idno>
<idno type="ChapterID">38</idno>
<idno type="ChapterID">Chap38</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: This paper describes the challenges for document image analysis community for building large digital libraries with diverse document categories. The challenges are identified from the experience of the on-going activities toward digitizing and archiving one million books. Smooth workflow has been established for archiving large quantity of books, with the help of efficient image processing algorithms. However, much more research is needed to address the challenges arising out of the diversity of the content in digital libraries.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Inde</li>
<li>États-Unis</li>
</country>
<region>
<li>Pennsylvanie</li>
</region>
<settlement>
<li>Pittsburgh</li>
</settlement>
<orgName>
<li>Université Carnegie-Mellon</li>
</orgName>
</list>
<tree>
<country name="Inde">
<noRegion>
<name sortKey="Sankar, Pramod" sort="Sankar, Pramod" uniqKey="Sankar P" first="Pramod" last="Sankar">Pramod Sankar</name>
</noRegion>
<name sortKey="Jawahar, V" sort="Jawahar, V" uniqKey="Jawahar V" first="V." last="Jawahar">V. Jawahar</name>
<name sortKey="Jawahar, V" sort="Jawahar, V" uniqKey="Jawahar V" first="V." last="Jawahar">V. Jawahar</name>
<name sortKey="Pratha, Lakshmi" sort="Pratha, Lakshmi" uniqKey="Pratha L" first="Lakshmi" last="Pratha">Lakshmi Pratha</name>
</country>
<country name="États-Unis">
<region name="Pennsylvanie">
<name sortKey="Ambati, Vamshi" sort="Ambati, Vamshi" uniqKey="Ambati V" first="Vamshi" last="Ambati">Vamshi Ambati</name>
</region>
</country>
</tree>
</affiliations>
</record>
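
The `<tree>` element at the end of the record groups author names by country. As a minimal sketch of how that grouping could be read back, the Python below walks an abridged copy of the tree and collects the `sortKey` values under each `<country>`. The function name `authors_by_country` and the abridged `TREE` string are illustrative assumptions, not part of Dilib. Names are deduplicated because the tree above lists V. Jawahar twice under the same country.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the <tree> element from the record above
# (only the sortKey attribute is kept on each <name>).
TREE = """<tree>
<country name="Inde">
<noRegion>
<name sortKey="Sankar, Pramod">Pramod Sankar</name>
</noRegion>
<name sortKey="Jawahar, V">V. Jawahar</name>
<name sortKey="Jawahar, V">V. Jawahar</name>
<name sortKey="Pratha, Lakshmi">Lakshmi Pratha</name>
</country>
<country name="États-Unis">
<region name="Pennsylvanie">
<name sortKey="Ambati, Vamshi">Vamshi Ambati</name>
</region>
</country>
</tree>"""


def authors_by_country(tree_xml):
    """Map each country name to the sorted, deduplicated sortKeys below it."""
    root = ET.fromstring(tree_xml)
    result = {}
    for country in root.findall("country"):
        seen = []
        # iter() also reaches names nested in <noRegion> or <region>.
        for name in country.iter("name"):
            key = name.get("sortKey")
            if key not in seen:
                seen.append(key)
        result[country.get("name")] = sorted(seen)
    return result


groups = authors_by_country(TREE)
for country, names in groups.items():
    print(country, names)
```

The country names are kept in French because they are the values stored in the record's `name` attributes.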

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001088 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001088 | SxmlIndent | more
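
Outside the Dilib toolchain, a record like the one above can presumably also be read with a general-purpose XML library. The sketch below uses Python's standard `xml.etree.ElementTree` on an abridged excerpt of the record. One caveat: the record uses the `wicri:` prefix without declaring it, and a namespace-aware parser rejects an unbound prefix, so the sketch binds it to a placeholder URI first. The URI `urn:example:wicri`, the function `parse_record`, and the `SAMPLE` excerpt are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI (an assumption): the record does not declare
# the wicri: prefix, so some binding is needed before parsing.
WICRI_NS = "urn:example:wicri"

# Abridged excerpt of the record shown above.
SAMPLE = """<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Digitizing a Million Books: Challenges for Document Analysis</title>
<author>
<name sortKey="Sankar, Pramod" first="Pramod" last="Sankar">Pramod Sankar</name>
</author>
<author>
<name sortKey="Ambati, Vamshi" first="Vamshi" last="Ambati">Vamshi Ambati</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="doi">10.1007/11669487_38</idno>
</publicationStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>"""


def parse_record(xml_text):
    """Extract title, author names, and DOI from a record string."""
    # Bind the undeclared wicri: prefix on the root element.
    patched = xml_text.replace(
        "<record>", f'<record xmlns:wicri="{WICRI_NS}">', 1
    )
    root = ET.fromstring(patched)
    title_stmt = root.find(".//titleStmt")
    return {
        "title": title_stmt.findtext("title"),
        "authors": [n.text for n in title_stmt.findall("author/name")],
        "doi": root.findtext(".//publicationStmt/idno[@type='doi']"),
    }


record = parse_record(SAMPLE)
print(record["title"])
print(record["authors"])
print(record["doi"])
```

Patching the root tag as a string is a shortcut that only works when the record starts with a bare `<record>` element, as it does here.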

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:E96E767CE48405122392E7508C98969E20DA18DE
   |texte=   Digitizing a Million Books: Challenges for Document Analysis
}}
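
When many records need such links, the template call above could be produced programmatically. The following is a small sketch, assuming the layout shown above (three leading spaces, values aligned at column 13). The parameter names come from the template itself, while the helper `explor_link` is hypothetical and not part of Wicri or Dilib.

```python
def explor_link(wiki, area, flux, etape, key_type, key, text):
    """Build an {{Explor lien}} template call in the layout shown above."""
    fields = [
        ("wiki", wiki),
        ("area", area),
        ("flux", flux),
        ("étape", etape),
        ("type", key_type),
        ("clé", key),
        ("texte", text),
    ]
    lines = ["{{Explor lien"]
    for name, value in fields:
        # Pad "   |name=" to 13 characters so the values line up.
        lines.append(f"   |{name}=".ljust(13) + value)
    lines.append("}}")
    return "\n".join(lines)


link = explor_link(
    wiki="Ticri/CIDE",
    area="OcrV1",
    flux="Main",
    etape="Exploration",
    key_type="RBID",
    key="ISTEX:E96E767CE48405122392E7508C98969E20DA18DE",
    text="Digitizing a Million Books: Challenges for Document Analysis",
)
print(link)
```

The column alignment is cosmetic. Whether the wiki template is sensitive to whitespace is not stated on this page.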

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024